(54) Voice browser system



(57) To provide a browser apparatus with the contents
of data provided on a network in a form of voice
data, voice data indicating a part or the whole of the
contents of the data provided on the network is formed and
stored on a gateway, on the basis of the data. Data is
formed by adding to the data provided on the network
an identifier <VOICEOUT...> indicating a location where
the voice data is stored. This data is provided to the
browser apparatus. The browser apparatus receives the
voice data from the location indicated by the identifier.

Description

FIELD OF THE INVENTION 

[0001] The present invention relates to a browser sys- 
tem and the like which realize input/output of information 
performed between a server and a client via a network 
by using voices on the client side. 

BACKGROUND OF THE INVENTION 

[0002] An example of conventional voice browser 
systems having a voice input/output function is a voice- 
controllable computer proposed in Japanese Patent 
Laid-Open No. 10-124293 by which a client performs 
voice synthesis and voice recognition. Unfortunately, a
voice browser system having this configuration has the
problem that, when the client is implemented by hardware
with limited computational resources, such as a portable
terminal, the processing load on the client is too large
for those resources.

[0003] Accordingly, voice browser systems which
synthesize and recognize voices by using hardware
different from hardware for implementing a client have
been invented. An example is a browser system or a
voice proxy server proposed in Japanese Patent
Laid-Open No. 11-110186.

[0004] In the above conventional voice browser sys- 
tem, however, a browser process for displaying data de- 
scribed in a markup language such as HTML is sepa- 
rated from a process for outputting and inputting voices 
by voice synthesis and voice recognition. Therefore, be- 
tween hardware for performing voice synthesis and 
voice recognition and hardware for implementing a cli- 
ent, communication for exchanging voice output data 
and voice input data must be performed in addition to 
communication accomplished by HTTP or the like to ex- 
change data described in HTML or the like. 

[0005] This requires complicated communication
control and control for synchronizing the individual
processes, and hence makes the construction of a voice
browser system difficult. In addition, a firewall which
prohibits all communication except HTTP communication
is often placed between a client and a server. Since
no other communication is possible in this case, a voice
browser system which relies on a separate channel for
voice data is difficult to construct.

SUMMARY OF THE INVENTION 

[0006] It is, therefore, an object of the present inven- 
tion to provide a data processing apparatus and method, 
browser system, browser apparatus, and recording me- 
dium capable of displaying data provided on a network 
and outputting or inputting a voice corresponding to that 
data in a common communication process. 

[0007] According to the present invention, there is
provided a data processing apparatus for providing a
browser apparatus with the contents of data provided
on a network in a form of voice data, comprising means
for forming, on the basis of the data provided on the
network, voice data indicating a part or the whole of the
contents of the data, means for storing the formed voice
data, means for forming data by adding to the data
provided on the network an identifier indicating a location
where the voice data is stored, and means for providing
the browser apparatus with the data to which the
identifier is added.

[0008] According to the present invention, there is
provided a data processing apparatus for permitting a
browser apparatus to respond by voice to data provided
on a network, comprising means for checking whether
the contents of the data provided on the network include
a content requiring a response from the browser
apparatus, means for forming data by adding to the data
provided on the network an identifier indicating a recipient
of the response sent by voice data from the browser
apparatus, and means for providing the browser apparatus
with the data to which the identifier is added.

[0009] According to the present invention, there is
provided a browser system comprising a browser
apparatus, a server for providing data to the browser
apparatus via a network, and a data processing apparatus
for providing the browser apparatus with the contents of
data provided by the server in a form of voice data,
wherein

the data processing apparatus comprises means
for forming, on the basis of the data provided by the
server, voice data indicating a part or the whole of
the contents of the data, means for storing the
formed voice data, means for forming data by
adding to the data provided by the server an identifier
indicating a location where the voice data is stored,
and means for providing the browser apparatus with
the data to which the identifier is added, and
the browser apparatus comprises means for
acquiring the voice data from the location indicated by the
identifier and outputting a voice related to the voice
data.

[0010] According to the present invention, there is
provided a browser system comprising a browser
apparatus, a server for providing data to the browser
apparatus via a network, and a data processing apparatus
for permitting the browser apparatus to respond by voice
to data provided by the server, wherein

the data processing apparatus comprises means
for checking whether the contents of the data
provided on the network include a content requiring a
response from the browser apparatus, means for
forming data by adding to the data provided by the
server an identifier indicating a recipient of the
response sent by voice data from the browser
apparatus, means for providing the browser apparatus
with the data to which the identifier is added,
recognizing means for performing voice recognition for
voice data related to the response, when the voice
data is supplied from the browser apparatus to the
recipient, means for forming response data in a
form suited to the server for receiving the response,
on the basis of the result of recognition by the
recognizing means, and means for providing the
response data to the server, and
the browser apparatus comprises means for
inputting a voice, means for forming voice data on the
basis of the input voice, and means for supplying
the formed voice data to a recipient indicated by the
identifier.

[0011] According to the present invention, there is
provided a browser system comprising a browser
apparatus, a server for providing data to the browser
apparatus via a network, and a data processing apparatus
for providing the contents of data provided by the server
in a form of voice data to the browser apparatus, and
permitting the browser apparatus to respond by voice to
data provided by the server, wherein

the data processing apparatus comprises means
for forming, on the basis of the data provided by the
server, voice data indicating a part or the whole of
the contents of the data, means for storing the
formed voice data, means for forming data by
adding to the data provided by the server a first identifier
indicating a location where the voice data is stored,
means for providing the browser apparatus with the
data to which the first identifier is added, means for
checking whether the contents of the data provided
by the server include a content requiring a response
from the browser apparatus, means for forming
data by adding to the data provided by the server a
second identifier indicating a recipient of the
response sent by voice data from the browser
apparatus, means for providing the browser apparatus
with the data to which the identifier is added,
recognizing means for performing voice recognition for
voice data related to the response, when the voice
data is supplied from the browser apparatus to the
recipient, means for forming response data in a
form suited to the server for receiving the response,
on the basis of the result of recognition by the
recognizing means, and means for providing the
response data to the server, and
the browser apparatus comprises means for
acquiring the voice data from the location indicated by the
first identifier and outputting a voice related to the
voice data, means for inputting a voice, means for
forming voice data on the basis of the input voice,
and means for supplying the formed voice data to
a recipient indicated by the second identifier.

[0012] According to the present invention, there is
provided a data processing method of providing a
browser apparatus with the contents of data provided
on a network in a form of voice data, comprising the
steps of forming, on the basis of the data provided on
the network, voice data indicating a part or the whole of
the contents of the data, storing the formed voice data,
forming data by adding to the data provided on the
network an identifier indicating a location where the voice
data is stored, and providing the browser apparatus with
the data to which the identifier is added.
[0013] According to the present invention, there is 
provided a data processing method of permitting a 
browser apparatus to respond by voice to data provided 
on a network, comprising the steps of checking whether 
the contents of the data provided on the network include 
a content requiring a response from the browser appa- 
ratus, forming data by adding to the data provided on 
the network an identifier indicating a recipient of the re- 
sponse sent by voice data from the browser apparatus, 
and providing the browser apparatus with the data to 
which the identifier is added. 

[0014] According to the present invention, there is 
provided a recording medium recording a program 
which, in order to provide a browser apparatus with the 
contents of data provided on a network in a form of voice 
data, allows a computer to function as means for form- 
ing, on the basis of the data provided on the network, 
voice data indicating a part or the whole of the contents 
of the data, means for storing the formed voice data, 
means for forming data by adding to the data provided 
on the network an identifier indicating a location where 
the voice data is stored, and means for providing the 
browser apparatus with the data to which the identifier 
is added. 

[0015] According to the present invention, there is 
provided a recording medium recording a program 
which, in order to permit a browser apparatus to respond 
by voice to data provided on a network, allows a com- 
puter to function as means for checking whether the 
contents of the data provided on the network have con- 
tents requiring a response from the browser apparatus, 
means for forming data by adding to the data provided 
on the network an identifier indicating a recipient of the 
response sent by voice data from the browser appara- 
tus, and means for providing the browser apparatus with 
the data to which the identifier is added. 
[0016] According to the present invention, there is 
provided a browser apparatus comprising means for in- 
putting a voice, means for forming voice data on the ba- 
sis of the input voice, and means for supplying the 
formed voice data to a recipient indicated by a given 
identifier. 

[0017] According to the present invention, there is 
provided a data processing apparatus capable of com- 
municating with a server and a browser apparatus via a 
network, comprising means for forming, on the basis of 
data provided by the server, voice data indicating a part 
or the whole of the contents of the data, means for storing
the formed voice data, means for adding to the data
provided by the server a first identifier indicating a location
where the voice data is stored, means for checking
whether the contents of the data provided by the server 
include a content requiring a response from the browser 
apparatus, means for further adding, when the contents 
of the data provided by the server have contents requir- 
ing a response, a second identifier indicating a recipient 
of the response to the data to which the first identifier is 
added, means for providing the browser apparatus with 
the data to which the first identifier or the first and sec- 
ond identifiers are added, recognizing means for per- 
forming voice recognition for voice data related to the 
response, when the voice data is supplied from the 
browser apparatus to the recipient, means for forming 
response data in a form suited to the server for receiving 
the response, on the basis of the recognition result by 
the recognizing means, and means for providing the re- 
sponse data to the server. 

[0018] Other features and advantages of the present 
invention will be apparent from the following description 
taken in conjunction with the accompanying drawings, 
in which like reference characters designate the same 
or similar parts throughout the figures thereof. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0019] The accompanying drawings, which are incor- 
porated in and constitute a part of the specification, il- 
lustrate embodiments of the invention and, together with 
the description, serve to explain the principles of the in- 
vention. 

Fig. 1 is a view showing the configuration of a voice
browser system according to an embodiment of the
present invention;

Fig. 2 is a block diagram showing the basic
configuration of a client computer 101;

Fig. 3 is a view showing an example of client HTTP
response data;

Fig. 4 is a view showing an example of client HTTP
request data;

Fig. 5 is a block diagram showing the basic
configuration of a voice gateway computer 102;

Fig. 6 is a flow chart showing processing in the voice
gateway computer 102;

Fig. 7 is a view showing an example of HTTP
response data;

Fig. 8 is a view showing an example of the data
configuration of a next request holding unit 511 when
the data shown in Fig. 7 is processed; and

Fig. 9 is a view showing a communication example
between computers according to the embodiment
of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED
EMBODIMENT

[0020] A preferred embodiment of the present
invention will now be described in detail in accordance with
the accompanying drawings.

[0021] Fig. 1 is a view showing the configuration of a
voice browser system according to an embodiment of
the present invention.

[0022] This voice browser system comprises a client
computer 101, a server computer 103, and a voice gateway
computer 102 connected to each other by a network
104. The client computer 101 serves as a browser
apparatus. The server computer 103 provides information
to the client computer 101. The voice gateway computer
102 provides the client computer 101 with information
provided by the server computer 103 as voice data. The
voice gateway computer 102 also realizes, as voice data,
a request from the client computer 101 to the server
computer 103 or a response from the server computer 103
to the client computer 101.

[0023] The client computer 101, the voice gateway
computer 102, and the server computer 103 each have
a communication device and can exchange HTTP
messages over TCP/IP across the network 104.

[0024] As the server computer 103, it is possible to
use a conventional computer called a Web server, i.e.,
a computer which executes a Web server program for
receiving an HTTP request and returning, as an HTTP
response, data which matches the URL of the HTTP
request.

[0025] The network 104 is, e.g., the Internet or an in- 
tranet. 

[0026] In this voice browser system, the client
computer 101 and the server computer 103 can
communicate via the voice gateway computer 102.

[0027] In this specification, a request from the client
computer 101 to the server computer 103 is called an
HTTP request. In particular, a request from the client
computer 101 to the voice gateway computer 102 is also
called a client HTTP request, and a corresponding
request from the voice gateway computer 102 to the
server computer 103 is also called a server HTTP request.

[0028] Furthermore, an offer of information from the
server computer 103 to the client computer 101 in
response to an HTTP request is called an HTTP response.
In particular, a response to the voice gateway computer
102 is also called a server HTTP response, and a
corresponding response from the voice gateway computer
102 to the client computer 101 is also called a client
HTTP response.

[0029] The details of the client computer 101 and the
voice gateway computer 102 will be described below.

[0030] The client computer 101 includes input/output devices
such as a display, keyboard, and mouse, and voice I/O
devices such as a loudspeaker and microphone. This
client computer 101 can display data described in HTML
and provided by the server computer 103, can output as
a voice the voice data which represents the contents of the
data provided by the server computer 103 and which is
supplied from the voice gateway computer 102, and can
form a client HTTP request either from character input or
containing voice data.

[0031] Fig. 2 is a block diagram showing the basic
configuration of the client computer 101 which functions
as a browser apparatus.

[0032] Referring to Fig. 2, a response receiver 201
receives an HTTP response. An HTML data analyzer 202
analyzes HTML data contained in the HTTP response
received by the response receiver 201. An HTML
display 203 displays data in accordance with the result of
the analysis by the HTML data analyzer 202. A voice
output unit 204 outputs a voice of voice data, if any, in
accordance with the analytical result from the HTML
data analyzer 202.

[0033] A direct input unit 205 accepts a user input 
from an input device such as a keyboard. A voice input 
unit 207 accepts a voice input. A designation request 
input unit 206 accepts an operation for designation with 
respect to information provided by the server computer 
103 and displayed on the HTML display 203. A request 
forming unit 208 forms an HTTP request from one or a
combination of the inputs to the direct input unit 205, the
designation request input unit 206, and the voice input 
unit 207. A request transmitter 209 transmits the HTTP 
request formed by the request forming unit 208. 

[0034] The voice gateway computer 102 functions as
a data processor. That is, the voice gateway computer
102 performs voice recognition if a client HTTP request
transmitted from the client computer 101 contains voice
data. On the basis of the recognition result, the voice
gateway computer 102 forms and transmits a server
HTTP request. Also, from HTML data contained in a server
HTTP response transmitted from the server computer
103 in response to an HTTP request, the voice gateway
computer 102 forms voice data to be output as a voice
from the client computer 101. The voice gateway
computer 102 provides this voice data together with the
HTML data to the client computer 101.

[0035] Fig. 5 is a block diagram showing the basic 
configuration of the voice gateway computer 102. 

[0036] Referring to Fig. 5, a request receiver 501
receives a client HTTP request transmitted from the client
computer 101. A voice recognition unit 502 performs
voice recognition if the client HTTP request received by
the request receiver 501 contains voice data. On the
basis of the recognition result from the voice recognition
unit 502, a request converter 503 converts the client
HTTP request containing the voice data into a server HTTP
request having a format suited to the server computer
103.

[0037] A request transmitter 504 transmits the server
HTTP request to the server computer 103. If a
corresponding client HTTP request contains voice data, the
request transmitter 504 transmits the data converted by
the request converter 503 to the server computer 103.
If a corresponding client HTTP request does not contain
any voice data, the request transmitter 504 transmits the
data received by the request receiver 501 to the server
computer 103. A response receiver 505 receives a
server HTTP response in response to the server HTTP
request transmitted by the request transmitter 504. An
HTML data analyzer 506 analyzes HTML data
contained in the server HTTP response received by the
response receiver 505.

[0038] A voice synthesizer 507 forms voice data to be
output as a voice by using the result of the analysis by
the HTML data analyzer 506. By using the analytical
result from the HTML data analyzer 506, an input
candidate forming unit 508 forms a voice input candidate to
be accepted next and forms the request to be issued
when this candidate is input. That is, the input candidate
forming unit 508 checks whether the information
provided to the client computer 101 by the server computer
103 has contents which require a response from the
client computer 101. If such contents exist, the input
candidate forming unit 508 forms candidates for the
response.

[0039] A recognition grammar forming unit 509 forms
a voice recognition grammar to be used by the voice
recognition unit 502, from the input candidate formed by
the input candidate forming unit 508. A recognition
grammar holding unit 510 holds the recognition
grammar formed by the recognition grammar forming unit
509. A next request holding unit 511 holds a pair of an
input candidate formed by the input candidate forming
unit 508 and an HTTP request to be formed when a
voice input corresponding to the input candidate is
supplied.

[0040] A voice data holding unit 512 holds the voice
data formed by the voice synthesizer 507. An HTML
data converter 513 embeds a URL for acquiring the voice
data held in the voice data holding unit 512 and a URL
for activating the next voice recognition process, into the
HTML data contained in the response data received by
the response receiver 505.

[0041] A response transmitter 514 transmits the
HTML data formed by the HTML data converter 513 as
a client HTTP response to the client computer 101. If
a client HTTP request received by the request
receiver 501 is a request for the voice data held in the voice
data holding unit 512, a voice data returning unit 515
returns this voice data as a client HTTP response to the
client computer 101.

[0042] Fig. 3 shows an example of HTML data
contained in a client HTTP response. In this embodiment,
two extension tags, i.e., VOICEOUT and VOICEIN, are
used in addition to the HTML specifications defined as
HTML 4.0.

[0043] VOICEOUT is a tag serving as an identifier which
indicates the storage location of voice data formed by the
voice synthesizer 507, i.e., which indicates the voice
data holding unit 512.

[0044] VOICEIN is a tag serving as an identifier which, when
an HTTP request from the client computer 101 contains
voice data, indicates the recipient of the data, i.e.,
indicates the request receiver 501.

[0045] When VOICEOUT appears, the client computer
101 acquires the voice data at the URL designated by the
attribute of VOICEOUT by issuing another HTTP
request. The client computer 101 outputs the acquired
voice data from a voice output device such as a
loudspeaker.

[0046] The VOICEIN tag designates a base URL
which is the base of an HTTP request to be formed when
a voice is input to a voice input device such as a
microphone of the client computer 101.
[0047] Processing in the client computer 101 will be 
explained below by using the above example. 
[0048] The response receiver 201 receives an HTTP 
response containing the HTML data shown in Fig. 3, as 
an HTTP response to a certain HTTP request. The pro- 
cedure of the reception is analogous to that of a con- 
ventional browser apparatus. 

[0049] The HTML data analyzer 202 performs general
HTML data analysis except that data necessary for
operations are extracted even for the VOICEOUT and
VOICEIN tags. The HTML display 203 displays information
on the basis of the HTML data, similar to a conventional
browser apparatus.

[0050] When the VOICEOUT tag is analyzed, the
voice output unit 204 transmits an HTTP request which
requests voice data designated by a URL which is
indicated by the attribute of the tag, and acquires voice data
contained in the body of a corresponding HTTP
response from the voice gateway computer 102.

[0051] The voice output unit 204 outputs the acquired
voice data from a voice output device such as a
loudspeaker.

[0052] If a voice is input to a voice input device such 
as a microphone, the voice input unit 207 A/D-converts 
the input to form voice data such as PCM data. The
start and end points of this PCM data are determined
either by the period during which the voice input
power exceeds a threshold value or by the period during
which a certain key is pressed.
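
As an illustration of the power-threshold variant of this endpoint determination, the following Python sketch marks the span from the first to the last frame whose mean power exceeds a threshold and packs it as 16-bit PCM. The function names, the frame length, the threshold value, and the little-endian sample format are assumptions made for illustration; the embodiment does not specify them.

```python
import struct


def find_utterance(samples, frame_len=160, threshold=1.0e6):
    """Return (start, end) sample indices of the voiced span, or None.

    A frame counts as voiced when its mean power (mean of squared
    samples) exceeds the threshold. frame_len and threshold are
    illustrative values, not values taken from the embodiment.
    """
    start = end = None
    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        power = sum(s * s for s in frame) / len(frame)
        if power > threshold:
            if start is None:
                start = offset           # first frame above the threshold
            end = offset + len(frame)    # extend to the latest voiced frame
    return None if start is None else (start, end)


def to_pcm_bytes(samples):
    """Pack integer samples as 16-bit little-endian PCM (assumed format)."""
    return struct.pack("<%dh" % len(samples), *samples)
```

A caller would slice the captured samples with the returned indices and pass the slice to `to_pcm_bytes` to obtain the voice data.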

[0053] If there is an input to the voice input unit 207,
the request forming unit 208 forms a POST request to
the URL indicated as the attribute of the VOICEIN tag,
on the basis of the result of the analysis by the HTML
data analyzer 202. The request forming unit 208 places
the PCM data formed by the voice input unit 207 into the
body of the POST request. If a voice is input to the client
computer 101 which is outputting the HTML data as
shown in Fig. 3, an HTTP request is formed as shown
in Fig. 4. The request transmitter 209 transmits the
HTTP request formed by the request forming unit 208 to the
computer designated by the URL of the request, i.e., to
the voice gateway computer 102.
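
The request formation described above can be sketched in Python with the standard library as follows. This is a minimal sketch, not the format shown in Fig. 4: the function name, the example URL http://gateway/voicein, and the Content-Type value are assumptions made for illustration.

```python
import urllib.request


def build_voice_request(voicein_url, pcm_bytes):
    """Build a POST request whose body carries the PCM voice data.

    voicein_url stands for the URL taken from the VOICEIN tag attribute.
    The Content-Type value is an assumption; the embodiment does not
    name one.
    """
    return urllib.request.Request(
        voicein_url,
        data=pcm_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would transmit it over ordinary HTTP, which is the point of the scheme: the voice data travels in the same kind of message as any other request.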
[0054] An outline of the processing in the voice gate- 
way computer 102 will be described below with refer- 
ence to a flow chart shown in Fig. 6. 

[0055] In the main routine of this processing, the voice
gateway computer 102 waits for a connection request
on a port for receiving HTTP (usually port 80, although
other ports may be used). When a connection request is
issued from the client computer 101, the voice gateway
computer 102 establishes a connection and starts the
processing explained here. In this embodiment, the
processing is performed in the same single thread
as the waiting process for the sake of descriptive
simplicity. However, this processing can also be realized
with multiple threads. When this processing is completed, the
flow returns to the port connection request waiting
process.

[0056] In step S601, the voice gateway computer 102
receives an HTTP request from the client computer 101.
The flow advances to step S602.

[0057] In step S602, the voice gateway computer 102
extracts the URL from the HTTP request data. If this
URL indicates "/out.wav" of the voice gateway computer
102, the flow advances to step S612; if not, the flow
advances to step S603.

[0058] In step S603, if the URL indicates "/voicein" of
the voice gateway computer 102, the flow advances to
step S604; if not, the flow advances to step S606.
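
The URL checks of steps S602 and S603 amount to a three-way dispatch, which might be sketched in Python as below. The host name "gateway", the paths /out.wav and /voicein as literal strings, and the returned labels are assumptions made for illustration.

```python
from urllib.parse import urlsplit

GATEWAY_HOST = "gateway"  # assumed host name of the voice gateway computer


def classify_request(url):
    """Classify a client HTTP request by its URL.

    Returns "voice_data" for a request for the stored synthetic voice,
    "voice_input" for a request carrying voice data to be recognized,
    and "proxy" for anything else, which is forwarded to the server.
    """
    parts = urlsplit(url)
    if parts.hostname == GATEWAY_HOST and parts.path == "/out.wav":
        return "voice_data"
    if parts.hostname == GATEWAY_HOST and parts.path == "/voicein":
        return "voice_input"
    return "proxy"
```

Checking the host as well as the path keeps a request such as http://server/out.wav on the ordinary proxy branch.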

[0059] In step S604, the voice gateway computer 102
extracts the body of the HTTP request and performs
voice recognition by using the extracted body as voice
data. This voice recognition is done by using the
recognition grammar held in the recognition grammar holding
unit 510. The flow advances to step S605.

[0060] In step S605, the voice gateway computer 102
extracts the next HTTP request corresponding to the
result recognized in step S604 from the next request
holding unit 511. The flow advances to step S607.

[0061] In step S606, the voice gateway computer 102
sets the URL of the HTTP request sent from the client
computer 101 as the next HTTP request. The flow
advances to step S607.

[0062] In step S607, the voice gateway computer 102
transmits the next HTTP request to the server (server
computer 103) indicated by the host portion in the URL
of the next HTTP request, and obtains an HTTP
response. This operation is identical with that of a
conventional proxy apparatus. The flow then advances to step
S608. Fig. 7 is a view showing an example of the HTTP
response data.

[0063] In step S608, the voice gateway computer 102
analyzes HTML data in the body of the HTTP response
(server HTTP response) received in step S607. This
analysis makes it possible to extract the tree structure
and elements of each tag in the HTML data. The flow
advances to step S609.

[0064] In step S609, the voice gateway computer 102
uses the analytical result in step S608 to form voice data
to be output as a voice from the client computer
101. That is, the voice gateway computer 102 forms
voice data by performing voice synthesis for some or all
of the text in the HTML data. The text to be subjected to this
voice synthesis can be arbitrarily determined. In this
embodiment, it is assumed, for the sake of simplicity, that
voice synthesis is performed for the first P tag element.
In the data example shown in Fig. 7, synthetic voice data
"Select product type." is formed. This synthetic voice
data is stored as a WAVE-format file in a location referenced
by "/out.wav". The flow advances to step S610.
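
Selecting the text of the first P element, as assumed in this embodiment, could be sketched in Python with the standard html.parser module as follows. The class and function names and the set of tags treated as ending a paragraph are assumptions for illustration, and the voice synthesis itself is outside the sketch.

```python
from html.parser import HTMLParser

# Start tags assumed to end an open P element whose end tag was omitted.
ENDS_PARAGRAPH = {"p", "form", "div", "ul", "ol", "table"}


class FirstParagraphText(HTMLParser):
    """Collect the character data of the first P element only."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.in_p:
            if tag in ENDS_PARAGRAPH:
                self._finish()
        elif tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if self.in_p and tag == "p":
            self._finish()

    def handle_data(self, data):
        if self.in_p:
            self.parts.append(data)

    def _finish(self):
        self.in_p = False
        self.done = True


def first_paragraph_text(html):
    """Return the whitespace-normalized text of the first P element."""
    parser = FirstParagraphText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

The returned string is what would be handed to a speech synthesizer and stored as the voice data.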

[0065] In step S610, the voice gateway computer 102
outputs this HTML data and the voice data to the client
computer 101 and, on the basis of the contents of these
data, forms voice input candidates for a response to be
accepted from the client computer 101. The voice input to
be accepted can be arbitrarily determined. In this
embodiment, elements of OPTION tags in a SELECT tag
are adopted as the input candidates for the sake of
simplicity. In the example shown in Fig. 7, the input
candidates are "Copy", "Printer", and "Fax".
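
Collecting the OPTION elements of a SELECT tag as input candidates might look like the following Python sketch. The class and function names are hypothetical, and the sample markup in the usage note is only a guess at the kind of data Fig. 7 contains.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect the text of OPTION elements found inside SELECT elements."""

    def __init__(self):
        super().__init__()
        self.in_select = False
        self.current = None   # text pieces of the OPTION being read
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self.in_select = True
        elif tag == "option" and self.in_select:
            self._close_option()   # a new OPTION implicitly ends the last
            self.current = []

    def handle_endtag(self, tag):
        if tag == "option":
            self._close_option()
        elif tag == "select":
            self._close_option()
            self.in_select = False

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)

    def _close_option(self):
        if self.current is not None:
            self.labels.append(" ".join("".join(self.current).split()))
            self.current = None


def input_candidates(html):
    """Return the OPTION texts of the page as voice input candidates."""
    collector = OptionCollector()
    collector.feed(html)
    collector.close()
    return collector.labels
```

For markup such as `<select name="type"><option>Copy<option>Printer<option>Fax</select>`, the function returns the three candidate words in document order, whether or not the OPTION end tags are present.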

[0066] The voice gateway computer 102 forms a
recognition grammar for recognizing each word of the input
candidates. In addition, as the next HTTP request when
each element is input as a voice, the voice gateway
computer 102 forms a request URL generated when the
corresponding SELECT tag is selected and the form is
submitted, and holds this request URL in the next
request holding unit 511.

[0067] Fig. 8 shows an example of the data
configuration in the next request holding unit 511, when the data
shown in Fig. 7 is processed. Referring to Fig. 8, each
row corresponds to one input candidate. A column 801
holds character strings of input candidates. A column
802 holds the URLs of the next HTTP requests. The flow
then advances to step S611.
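
A table of this shape, pairing each input candidate with the URL of the next HTTP request, could be built as in the Python sketch below. It assumes the form is submitted with the GET method; the function name, the form action /select.cgi, the field name "type", and the option values are hypothetical and are not taken from Fig. 7 or Fig. 8.

```python
from urllib.parse import urlencode, urljoin


def build_next_requests(page_url, form_action, select_name, options):
    """Map each input candidate to the URL a GET form submission would use.

    options is a list of (label, value) pairs, one per OPTION element.
    """
    base = urljoin(page_url, form_action)
    return {
        label: base + "?" + urlencode({select_name: value})
        for label, value in options
    }


# Usage with hypothetical form data:
table = build_next_requests(
    "http://server/index.html",
    "/select.cgi",
    "type",
    [("Copy", "copy"), ("Printer", "printer"), ("Fax", "fax")],
)
next_url = table.get("Printer")  # looked up after recognition succeeds
```

Looking up the recognized word in the dictionary corresponds to extracting the next HTTP request from the next request holding unit 511.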

[0068] In step S611, the voice gateway computer 102
embeds the VOICEOUT tag and VOICEIN tag in this
HTML data. In this embodiment, the URLs of these tags
are fixed, so the same tag patterns are always
embedded. The voice gateway computer 102 sets the HTML
data in which the tags are embedded as a client
response, and the flow advances to step S613.
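
Because the same two tags are always embedded, the conversion can be sketched as a fixed string insertion, as in the Python example below. The attribute names SRC and ACTION, the URLs, and the choice to insert the tags immediately after the BODY start tag are assumptions for illustration; the actual tag syntax is the one shown in Fig. 3.

```python
import re

# Assumed tag patterns; the real attribute names are defined by Fig. 3.
VOICEOUT_TAG = '<VOICEOUT SRC="http://gateway/out.wav">'
VOICEIN_TAG = '<VOICEIN ACTION="http://gateway/voicein">'


def embed_voice_tags(html):
    """Insert the fixed VOICEOUT and VOICEIN tags into the HTML data.

    The tags go right after the BODY start tag when one exists,
    otherwise at the very beginning of the data.
    """
    tags = VOICEOUT_TAG + VOICEIN_TAG
    match = re.search(r"<body\b[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        return tags + html
    return html[:match.end()] + tags + html[match.end():]
```

A browser that does not know the two extension tags would typically ignore them, so the converted data should remain displayable as ordinary HTML.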
[0069] In step S612, the voice gateway computer 102 forms a client
response related to the voice data stored in step S609 of the
immediately preceding processing, and the flow advances to step
S613.
[0070] In step S613, the voice gateway computer 102 provides the
formed client HTTP response to the client computer 101. After that,
the voice gateway computer 102 disconnects from the client computer
101 and completes the processing.

[0071] An example of communication between the in- 
dividual computers in this embodiment will be described 
with reference to Fig. 9. 

[0072] Initially, a URL is directly input to the client computer
101 (browser), and a client HTTP request for
http://server/index.html is sent to the voice gateway computer 102
(901). Note that this URL is not always directly input; an HTTP
request for the URL is sometimes transmitted by designating, on the
browser display screen, an object having this URL as an anchor. The
transmission is similar to that of a conventional browser
apparatus.

[0073] Since the HTTP request is for the URL of the server 103, the
voice gateway computer 102 sends to the server 103 a new server
HTTP request for /index.html, similar to a conventional proxy
apparatus (902).
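The proxy-like rewriting of the client request into a server request can be illustrated as follows. This is a sketch under assumptions: the helper name `to_server_request` is hypothetical, and HTTP/1.0 is assumed because that is the version shown in the figures.

```python
from urllib.parse import urlsplit


def to_server_request(client_url, method="GET", version="HTTP/1.0"):
    """Splits an absolute URL into the target host and a request line for it."""
    parts = urlsplit(client_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, f"{method} {path} {version}"
```

For the request 901, `to_server_request("http://server/index.html")` yields the host `server` and a request line for `/index.html`, which corresponds to the request 902.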
[0074] The server 103 returns to the voice gateway computer 102 a
server HTTP response containing data indicated by /index.html in
its body (903). Fig. 7 shows an example of this HTTP response.
[0075] On the basis of the received server HTTP response, the voice
gateway computer 102 forms voice data and input candidates and
returns to the client computer 101 a client HTTP response
containing, in its body, HTML data (e.g., Fig. 4) in which new tags
are embedded (904).

[0076] The client computer 101 displays the HTML data contained in
the body of the received client HTTP response, and sends to the
voice gateway computer 102 a client HTTP request for voice data (in
the example shown in Fig. 4, http://gateway/out.wav) indicated by
the VOICEOUT tag (905).

[0077] The voice gateway computer 102 returns the voice data
indicated by out.wav to the client computer 101 (906). This voice
data is formed and stored before the client HTTP response (904) is
provided.
[0078] If a voice is input to the client computer 101, a client
HTTP request (POST request) containing the voice data in its body
is sent from the client computer 101 to the voice gateway computer
102 (907). For example, the data shown in Fig. 3 is transmitted.
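A client-side request of this kind can be sketched with the Python standard library. The example only constructs the request and does not send it. The function name `build_voice_post` and the Content-Type value are assumptions for illustration; the specification shows only the request line and a PCM body.

```python
import urllib.request


def build_voice_post(voicein_url, pcm_bytes):
    """Builds (without sending) a POST request whose body is the voice data."""
    return urllib.request.Request(
        voicein_url,
        data=pcm_bytes,
        method="POST",
        # Assumed header: the specification does not name a media type.
        headers={"Content-Type": "application/octet-stream"},
    )
```

The URL passed in would be the one given by the `action` attribute of the VOICEIN tag, for example http://gateway/voicein in Fig. 4.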
[0079] The voice gateway computer 102 performs voice recognition
for the voice data contained in the body of the received POST
request. If the voice data is recognized as "copy", the voice
gateway computer 102 sends to the server 103 a server HTTP request
for /cgil?category=copy, in accordance with the contents of the
next request holding unit 511 (908). A recognition grammar and the
contents of the next request holding unit 511 used in this
processing are formed before the previous response (904) is formed.
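The lookup from a recognition result to the next request can be sketched as a simple table search. This is an illustration only: the function name `next_request_for` and the case-insensitive comparison are assumptions, and the table contents follow Fig. 8.

```python
def next_request_for(recognized, next_requests):
    """Returns the held URL for a recognized word, or None if no candidate matches."""
    wanted = recognized.strip().lower()
    for candidate, url in next_requests.items():
        if candidate.lower() == wanted:
            return url
    return None


# Contents of the next request holding unit as in Fig. 8.
FIG8_TABLE = {
    "COPY": "http://server/cgil?category=copy",
    "PRINTER": "http://server/cgil?category=printer",
    "FAX": "http://server/cgil?category=fax",
}
```

Since the table is prepared before the response 904 is returned, handling the POST request 907 reduces to recognition followed by this lookup.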
[0080] In accordance with the received server HTTP request, the
server 103 activates a CGI program and returns a server HTTP
response to the voice gateway computer 102 (909).

[0081] In the same manner as when receiving the response 903, the
voice gateway computer 102 newly forms voice data and a recognition
grammar and returns a client HTTP response to the client computer
101 (910).
[0082] In the voice browser system of this embodiment as described
above, only the browser (client computer 101), the voice gateway
(voice gateway computer 102), and the server (server 103) exist, so
communications need only be performed among them. Therefore, it is
possible to display data provided by the server and to input or
output a voice corresponding to that data in a common communication
process. This simplifies the communication management. In addition,
all communications can be performed by HTTP. Hence, communications
can be performed without any problems even when firewalls, which
generally pass only HTTP, are present between the browser, voice
gateway, and server.

[0083] In the above embodiment, the browser, voice 
gateway, and server are implemented by the three com- 
puters, i.e., the client computer, voice gateway compu- 
ter, and server computer. However, the present inven- 
tion is not limited to this embodiment. For example, both 
the voice gateway and server can also be implemented 
by a single computer. 

[0084] Also, in the above embodiment, one browser 
apparatus, one voice gateway apparatus, and one serv- 
er are connected to the network. However, a plurality of 
browser apparatuses, voice gateway apparatuses, and/ 
or server apparatuses can also be connected, and any 
arbitrary apparatus can be used in accordance with data 
to be requested. 

[0085] In the above embodiment, one VOICEOUT tag and one VOICEIN
tag are added as identifiers in the voice gateway computer 102.
However, a plurality of VOICEOUT tags and/or VOICEIN tags can also
be added, and one or both of these two types of tags need not be
added. When a plurality of VOICEOUT tags are to be added, different
URLs for designating voice data are used. When a plurality of
VOICEIN tags are to be added, a plurality of recognition grammars
and a plurality of next HTTP request data are prepared, and
different URLs for designating the attributes of VOICEIN are used.
When requests containing voice data are sent by these URLs, the
plurality of recognition grammars and the plurality of next HTTP
request data prepared are used by discriminating between them by
using the URLs.
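Discrimination by URL among several VOICEIN tags can be sketched as a registry keyed by the path of each URL. The class name `VoiceInRegistry`, its methods, and the URLs used with it are hypothetical; the keys of each table stand in for the vocabulary of the corresponding recognition grammar.

```python
from urllib.parse import urlsplit


class VoiceInRegistry:
    """Keeps one candidate-to-URL table per VOICEIN URL (hypothetical helper)."""

    def __init__(self):
        self._tables = {}

    def register(self, voicein_url, next_requests):
        # The path of the VOICEIN URL selects which table applies.
        self._tables[urlsplit(voicein_url).path] = dict(next_requests)

    def resolve(self, posted_url, recognized_word):
        """Returns the next request URL for a word posted to a given VOICEIN URL."""
        table = self._tables.get(urlsplit(posted_url).path)
        if table is None:
            return None
        return table.get(recognized_word)
```

A gateway using such a registry would register one entry per embedded VOICEIN tag before returning the page, then call `resolve` with the URL of each incoming POST request.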
[0086] In the above embodiment, synthetic voice data is transferred
in the WAVE format, and input voice data is transferred as raw PCM
data. However, any arbitrary voice format can also be used. When a
plurality of voice formats are to be permitted, data indicating a
voice format is described in a tag attribute or in an HTTP header.
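The relation between the two formats named above can be illustrated with the standard `wave` module, which wraps raw PCM samples in a WAVE container. The function name `pcm_to_wave` and the default sample rate, sample width, and channel count are assumptions; the specification does not state these parameters.

```python
import io
import wave


def pcm_to_wave(pcm_bytes, sample_rate=8000, sample_width=2, channels=1):
    """Wraps raw PCM samples in a WAVE container and returns the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)   # bytes per sample
        writer.setframerate(sample_rate)    # samples per second
        writer.writeframes(pcm_bytes)
    return buffer.getvalue()
```

Because raw PCM carries no description of itself, a system permitting several formats needs the out-of-band indication mentioned above, whereas a WAVE file states its own parameters in its header.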
[0087] Tag names and attribute names are not re- 
stricted to those used in the above embodiment, so 
some other names can be used. Also, data expressed 
by an attribute can be expressed by a tag, or data ex- 
pressed by a tag can be expressed by an HTTP header. 
That is, arbitrary extension of HTTP and HTML can be 
used. 

[0088] Furthermore, data and programs are not limit- 
ed to HTML and HTTP; it is possible to use data de- 
scribed in another markup language or to use another 
protocol. For example, voice embedding and voice rec- 
ognition analogous to the above embodiment can also 
be performed for data described in WML by using WAP. 
[0089] The above embodiment can be achieved by a logic circuit
implementing a part or the whole of the above-mentioned function,
as well as by running a software program which implements the
function.
[0090] The preferred embodiment of the present invention has been
explained above. However, the object of the present invention can
also be achieved by supplying a storage medium storing program
codes of software for implementing the function of the above
embodiment to a system or an apparatus, and reading out and
executing the program codes stored in the storage medium by a
computer (or a CPU or MPU) of the system or apparatus. In this
case, the program codes read out from the storage medium implement
the function of the present invention, and the storage medium
storing these program codes constitutes the invention. Also,
besides the case where the function of the above embodiment is
implemented by the computer executing the read-out program codes,
the present invention includes a case where an OS (Operating
System) or the like running on the computer performs a part or the
whole of actual processing in accordance with designations by the
program codes and thereby implements the function of the above
embodiment.

[0091] Furthermore, the present invention also includes a case
where, after the program codes read out from the storage medium are
written in a memory of a function extension board inserted into the
computer or of a function extension unit connected to the computer,
a CPU or the like of the function extension board or function
extension unit performs a part or the whole of actual processing in
accordance with designations by the program codes and thereby
implements the function of the above embodiment.

[0092] As many apparently widely different embodiments of the
present invention can be made without departing from the spirit and
scope thereof, it is to be understood that the invention is not
limited to the specific embodiments thereof except as defined in
the claims.



Claims 

1. A data processing apparatus for providing a browser apparatus
with the contents of data provided on a network in a form of voice
data, characterized by comprising:

   means for forming, on the basis of the data provided on said
   network, voice data indicating a part or the whole of the
   contents of the data;
   means for storing the formed voice data;
   means for forming data by adding to the data provided on said
   network an identifier indicating a location where the voice data
   is stored; and
   means for providing said browser apparatus with the data to
   which the identifier is added.

2. A data processing apparatus for permitting a browser apparatus
to respond by voice to data provided on a network, characterized by
comprising:

   means for checking whether the contents of the data provided on
   said network include a content requiring a response from said
   browser apparatus;
   means for forming data by adding to the data provided on said
   network an identifier indicating a recipient of the response
   sent by voice data from said browser apparatus; and
   means for providing said browser apparatus with the data to
   which the identifier is added.

3. The apparatus according to claim 2, characterized by further
comprising recognizing means for performing voice recognition for
voice data related to the response, when the voice data is supplied
from said browser apparatus to said recipient.

4. The apparatus according to claim 3, characterized by further
comprising:

   means for forming response data in a form suited to a server for
   receiving the response on said network, on the basis of the
   result of recognition by said recognizing means; and
   means for providing the response data to said server.

5. The apparatus according to claim 2, characterized by further
comprising:

   means for forming a recognition grammar for recognizing voice
   data related to each of a plurality of predetermined items, when
   the response is to be selected from said plurality of items;
   means for determining, on the basis of the recognition grammar,
   to which item the voice data related to the response from said
   browser apparatus corresponds;
   means for forming response data in a form suited to a server for
   receiving the response on said network, in accordance with each
   item; and
   means for providing the response data to said server.



6. The apparatus according to claim 5, characterized in that the
response data is formed before data to which the identifier is
added is provided to said browser apparatus.

7. A browser system comprising a browser apparatus, a server for
providing data to said browser apparatus via a network, and a data
processing apparatus for providing said browser apparatus with the
contents of data provided by said server in a form of voice data,
characterized in that

   said data processing apparatus comprises:

   means for forming, on the basis of the data provided by said
   server, voice data indicating a part or the whole of the
   contents of the data;
   means for storing the formed voice data;
   means for forming data by adding to the data provided by said
   server an identifier indicating a location where the voice data
   is stored; and
   means for providing said browser apparatus with the data to
   which the identifier is added, and

   said browser apparatus comprises means for acquiring the voice
   data from the location indicated by the identifier and
   outputting a voice related to the voice data.

8. A browser system comprising a browser apparatus, a server for
providing data to said browser apparatus via a network, and a data
processing apparatus for permitting the browser apparatus to
respond by voice to data provided by said server, characterized in
that

   said data processing apparatus comprises:

   means for checking whether the contents of the data provided on
   said network include a content requiring a response from said
   browser apparatus;
   means for forming data by adding to the data provided by said
   server an identifier indicating a recipient of the response sent
   by voice data from said browser apparatus;
   means for providing said browser apparatus with the data to
   which the identifier is added;
   recognizing means for performing voice recognition for voice
   data related to the response, when the voice data is supplied
   from said browser apparatus to said recipient;
   means for forming response data in a form suited to said server
   for receiving the response, on the basis of the result of
   recognition by said recognizing means; and
   means for providing the response data to said server, and

   said browser apparatus comprises:

   means for inputting a voice;
   means for forming voice data on the basis of the input voice;
   and
   means for supplying the formed voice data to a recipient
   indicated by the identifier.

9. A browser system comprising a browser apparatus, a server for
providing data to said browser apparatus via a network, and a data
processing apparatus for providing the contents of data provided by
said server in a form of voice data to said browser apparatus, and
permitting said browser apparatus to respond by voice to data
provided by said server, characterized in that

   said data processing apparatus comprises:

   means for forming, on the basis of the data provided by said
   server, voice data indicating a part or the whole of the
   contents of the data;
   means for storing the formed voice data;
   means for forming data by adding to the data provided by said
   server a first identifier indicating a location where the voice
   data is stored;
   means for providing said browser apparatus with the data to
   which the first identifier is added;
   means for checking whether the contents of the data provided by
   said server include a content requiring a response from said
   browser apparatus;
   means for forming data by adding to the data provided by said
   server a second identifier indicating a recipient of the
   response sent by voice data from said browser apparatus;
   means for providing said browser apparatus with the data to
   which the identifier is added;
   recognizing means for performing voice recognition for voice
   data related to the response, when the voice data is supplied
   from said browser apparatus to said recipient;
   means for forming response data in a form suited to said server
   for receiving the response, on the basis of the result of
   recognition by said recognizing means; and
   means for providing the response data to said server, and

   said browser apparatus comprises:

   means for acquiring the voice data from the location indicated
   by the first identifier and outputting a voice related to the
   voice data;
   means for inputting a voice;
   means for forming voice data on the basis of the input voice;
   and
   means for supplying the formed voice data to a recipient
   indicated by the second identifier.

10. A data processing method of providing a browser apparatus with
the contents of data provided on a network in a form of voice data,
characterized by comprising the steps of:

   forming, on the basis of the data provided on the network, voice
   data indicating a part or the whole of the contents of the data;
   storing the formed voice data;
   forming data by adding to the data provided on the network an
   identifier indicating a location where the voice data is stored;
   and
   providing the browser apparatus with the data to which the
   identifier is added.

11. A data processing method of permitting a browser apparatus to
respond by voice to data provided on a network, characterized by
comprising the steps of:

   checking whether the contents of the data provided on the
   network include a content requiring a response from the browser
   apparatus;
   forming data by adding to the data provided on the network an
   identifier indicating a recipient of the response sent by voice
   data from the browser apparatus; and
   providing the browser apparatus with the data to which the
   identifier is added.

12. The method according to claim 11, characterized by further
comprising the recognition step of performing voice recognition for
voice data related to the response, when the voice data is supplied
from the browser apparatus to the recipient.

13. The method according to claim 12, characterized by further
comprising the steps of:

   forming response data in a form suited to a server for receiving
   the response on the network, on the basis of the result of
   recognition in the recognition step; and
   providing the response data to the server.

14. The method according to claim 11, characterized by further
comprising the steps of:

   forming a recognition grammar for recognizing voice data related
   to each of a plurality of predetermined items, when the response
   is to be selected from the plurality of items;
   determining, on the basis of the recognition grammar, to which
   item the voice data related to the response from the browser
   apparatus corresponds;
   forming response data in a form suited to a server for receiving
   the response on the network, in accordance with each item; and
   providing the response data to the server.

15. The method according to claim 14, characterized 
in that the response data is formed before data to 
which the identifier is added is provided to the 
browser apparatus. 

16. A recording medium recording a program which, in 
order to provide a browser apparatus with the con- 
tents of data provided on a network in a form of 
voice data, allows a computer to function as: 

means for forming, on the basis of the data provided on said
network, voice data indicating a part or the whole of the contents
of the data;
means for storing the formed voice data;
means for forming data by adding to the data provided on said
network an identifier indicating a location where the voice data is
stored; and
means for providing said browser apparatus with the data to which
the identifier is added.

17. A recording medium recording a program which, in 
order to permit a browser apparatus to respond by 
voice to data provided on a network, allows a com- 
puter to function as: 

means for checking whether the contents of the data provided on
said network have contents requiring a response from said browser
apparatus;

means for forming data by adding to the data 
provided on said network an identifier indicat- 
ing a recipient of the response sent by voice 
data from said browser apparatus; and 
means for providing said browser apparatus 
with the data to which the identifier is added. 

18. The medium according to claim 17, characterized in that said
program comprises a program which allows a computer to function as
recognizing means for performing voice recognition for voice data
related to the response, when the voice data is supplied from said
browser apparatus to said recipient.

19. The medium according to claim 18, characterized in that said
program comprises a program which allows a computer to function as:

means for forming response data in a form suit- 
ed to a server for receiving the response on said 
network, on the basis of the result of recognition 
by said recognizing means; and 
means for providing the response data to said 
server. 

20. The medium according to claim 17, characterized in that said
program comprises a program which allows a computer to function as:

means for forming a recognition grammar for 
recognizing voice data related to each of a plu- 
rality of predetermined items, when the re- 
sponse is to be selected from said plurality of 
items; 

means for determining, on the basis of the rec- 
ognition grammar, to which item the voice data 
related to the response from said browser ap- 
paratus corresponds; 

means for forming response data in a form suit- 
ed to a server for receiving the response on said 
network, in accordance with each item; and 
means for providing the response data to said server.

21. The medium according to claim 20, characterized in that the
response data is formed before data to which the identifier is
added is provided to said browser apparatus.

22. The apparatus according to claim 1, characterized in that the
data provided on said network is described in a markup language,
and the identifier is added to the data as a tag corresponding to
the markup language.

23. The apparatus according to claim 2, characterized in that the
data provided on said network is described in a markup language,
and the identifier is added to the data as a tag corresponding to
the markup language.

24. The system according to claim 7, characterized in that the data
provided by said server is described in a markup language, and the
identifier is added to the data as a tag corresponding to the
markup language.

25. The system according to claim 8, characterized in that the data
provided by said server is described in a markup language, and the
identifier is added to the data as a tag corresponding to the
markup language.

26. The system according to claim 9, characterized in that the data
provided by said server is described in a markup language, and the
identifier is added to the data as a tag corresponding to the
markup language.

27. The method according to claim 10, characterized in that the
data provided on said network is described in a markup language,
and the identifier is added to the data as a tag corresponding to
the markup language.

28. The method according to claim 11, characterized in that the
data provided on said network is described in a markup language,
and the identifier is added to the data as a tag corresponding to
the markup language.

29. The medium according to claim 16, characterized in that the
data provided on said network is described in a markup language,
and the identifier is added to the data as a tag corresponding to
the markup language.

30. The medium according to claim 17, characterized in that the
data provided on said network is described in a markup language,
and the identifier is added to the data as a tag corresponding to
the markup language.

31. A browser apparatus characterized by comprising:

   means for inputting a voice;
   means for forming voice data on the basis of the input voice;
   and
   means for supplying the formed voice data to a recipient
   indicated by a given identifier.

32. The apparatus according to claim 26, characterized by further
comprising means for acquiring voice data from a location indicated
by a given second identifier, and outputting a voice related to the
voice data.

33. A data processing apparatus capable of communicating with a
server and a browser apparatus via a network, characterized by
comprising:

   means for forming, on the basis of data provided by said server,
   voice data indicating a part or the whole of the contents of the
   data;
   means for storing the formed voice data;
   means for adding to the data provided by said server a first
   identifier indicating a location where the voice data is stored;
   means for checking whether the contents of the data provided by
   said server include a content requiring a response from said
   browser apparatus;
   means for further adding, when the contents of the data provided
   by said server have contents requiring a response, a second
   identifier indicating a recipient of the response to the data to
   which the first identifier is added;
   means for providing said browser apparatus with the data to
   which the first identifier or the first and second identifiers
   are added;
   recognizing means for performing voice recognition for voice
   data related to the response, when the voice data is supplied
   from said browser apparatus to said recipient;
   means for forming response data in a form suited to said server
   for receiving the response, on the basis of the recognition
   result by said recognizing means; and
   means for providing the response data to said server.

34. Computer executable instructions for causing a processor to
carry out the method according to any of claims 10 to 15, 27 or 28.
FIG. 3

POST http://gateway/voicein HTTP/1.0
(PCM DATA)
FIG. 4

HTTP/1.0 200 OK
Content-type: text/html

<HTML> <HEAD> <TITLE> Top </TITLE> </HEAD> <BODY>
<P> SELECT PRODUCT TYPE </P>
<FORM action="/cgil" method="get">
PRODUCT TYPE <SELECT name="category">
<OPTION value="copy"> COPY
<OPTION value="printer" selected> PRINTER
<OPTION value="fax"> FAX
</SELECT><BR>
<INPUT type="submit" value="DISPLAY">
</FORM>
<VOICEOUT href="http://gateway/out.wav">
<VOICEIN action="http://gateway/voicein">
</BODY> </HTML>
FIG. 7

HTTP/1.0 200 OK
Content-type: text/html

<HTML> <HEAD> <TITLE> Top </TITLE> </HEAD> <BODY>
<P> SELECT PRODUCT TYPE </P>
<FORM action="/cgil" method="get">
PRODUCT TYPE <SELECT name="category">
<OPTION value="copy"> COPY
<OPTION value="printer" selected> PRINTER
<OPTION value="fax"> FAX
</SELECT><BR>
<INPUT type="submit" value="DISPLAY">
</FORM>
</BODY></HTML>



EP 1 139 335 A2

FIG. 8

801        802
COPY       http://server/cgil?category=copy
PRINTER    http://server/cgil?category=printer
FAX        http://server/cgil?category=fax



FIG. 9

[Sequence diagram: BROWSER (101), VOICE GATEWAY (102), SERVER (103); message 901]



(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 1 139 335 A3

(12) EUROPEAN PATENT APPLICATION

(88) Date of publication A3: 05.12.2001 Bulletin 2001/49
(43) Date of publication A2: 04.10.2001 Bulletin 2001/40
(21) Application number: 01302942.6
(22) Date of filing: 29.03.2001
(51) Int. Cl.7: G10L 15/26, G10L 15/22, H04M 3/493
(84) Designated Contracting States:
     AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR
     Designated Extension States:
     AL LT LV MK RO SI
(30) Priority: 31.03.2000 JP 2000099418
(71) Applicant: CANON KABUSHIKI KAISHA
     Tokyo (JP)
(72) Inventors:
     • Itoh, Fumiaki
       Ohta-ku, Tokyo (JP)
     • Ikeda, Yuji
       Ohta-ku, Tokyo (JP)
     • Ueda, Takaya
       Ohta-ku, Tokyo (JP)
     • Fujii, Kenichi
       Ohta-ku, Tokyo (JP)
(74) Representative: Beresford, Keith Denis Lewis et al
     BERESFORD & Co.
     2-5 Warwick Court
     High Holborn
     London WC1R 5DJ (GB)

(54) Voice browser system


